Abstract

A Monte Carlo method to optimize cuts on variables is presented and evaluated. 
The method gives a much higher signal to noise ratio than does a manual choice of 
cuts. 



There are two important methods for improving the signal-to-background ratio:
likelihood analysis and cut-based analysis. Likelihood analysis has the
advantage of not discarding any potential signal. However, its statistical
significance is not as straightforward to evaluate as that of a cut-based
analysis. Cut-based analysis, on the other hand, cuts away parts of the signal
in order to reduce the background even more.

Traditionally, cuts on variables have been chosen with the help of good sense
and some experimentation.

In this letter I present an automatic method for searching for optimal cuts.
I have used only one simulated set of data, fairly large, and many different
backgrounds. The simulated data are heavy leptons with masses of 100-200 GeV
generated at center-of-mass energies of 183-209 GeV, corresponding to the OPAL
experiment at LEP.

The method is simple. First, determine which of the variables are most
relevant and what their ranges are. If possible, find some minimum cuts that
leave the signal intact while still reducing the background. This can
significantly reduce the time spent on each iteration below.

The cut optimization then has the following general algorithm: 

(1) Choose a random variable and change its cut randomly by a value
between 0 and $T \cdot \max$, where $T$ is initialized as $T_i = 100\%$ and
$\max$ is the maximum value of the variable.

(2) If this change leaves us with a higher $S/\sqrt{B}$ value, keep it;
otherwise discard it.

(3) Decrease $T$ and repeat from step (1).

A problem with this method is that it might get stuck in a local optimum
somewhere. This can be remedied by storing the final cuts and the $S/\sqrt{B}$
value and then reinitializing the process, iterating until a satisfactory
$S/\sqrt{B}$ value is obtained. The method can be parametrized by $\Delta T$,
the change in $T$ per iteration, $T_i$, the initial value of $T$, and $N_{it}$,
the number of reinitializations.
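
The three steps and the reinitialization loop can be sketched in a few lines of
Python. This is only a minimal illustration under several assumptions that are
not stated in the text: every cut is a lower threshold, the random change has a
random sign, cuts are clamped to the allowed range, T is expressed as a
fraction (1.0 = 100%), and the background count is floored (B_FLOOR) so that
the figure of merit stays finite when no background survives. All function
names and the toy data are hypothetical.

```python
import math
import random

B_FLOOR = 1.0  # assumption: floor on B so that S/sqrt(B) stays finite when B = 0


def significance(cuts, signal, background):
    """Return S/sqrt(B) for events whose every variable exceeds its lower cut."""
    def passes(event):
        return all(x > c for x, c in zip(event, cuts))
    s = sum(1 for e in signal if passes(e))
    b = sum(1 for e in background if passes(e))
    return s / math.sqrt(max(b, B_FLOOR))


def optimize_once(signal, background, ranges, t_init, delta_t, rng):
    """One run of steps (1)-(3), ending when T has been reduced to zero."""
    cuts = [lo for lo, _ in ranges]            # start from the minimum cuts
    best = significance(cuts, signal, background)
    n_steps = int(round(t_init / delta_t))
    for k in range(n_steps):
        t = t_init - k * delta_t               # step (3): T shrinks by delta_t
        i = rng.randrange(len(cuts))           # step (1): pick a random variable
        lo, hi = ranges[i]
        shift = rng.choice((-1.0, 1.0)) * rng.uniform(0.0, t * hi)
        trial = list(cuts)
        trial[i] = min(max(cuts[i] + shift, lo), hi)   # stay inside the range
        value = significance(trial, signal, background)
        if value > best:                       # step (2): keep only improvements
            cuts, best = trial, value
    return cuts, best


def optimize(signal, background, ranges, t_init=1.0, delta_t=0.02, n_it=4, seed=0):
    """Reinitialize n_it times and return the (cuts, S/sqrt(B)) pair with the best value."""
    rng = random.Random(seed)
    runs = [optimize_once(signal, background, ranges, t_init, delta_t, rng)
            for _ in range(n_it)]
    return max(runs, key=lambda run: run[1])


# Toy data (entirely made up): two variables, signal peaked high, background low.
toy = random.Random(1)
signal = [(toy.gauss(60, 10), toy.gauss(40, 10)) for _ in range(350)]
background = [(toy.expovariate(1 / 15), toy.expovariate(1 / 15)) for _ in range(2000)]
ranges = [(5.0, 100.0), (5.0, 100.0)]

baseline = significance([lo for lo, _ in ranges], signal, background)
cuts, best = optimize(signal, background, ranges)
print(f"baseline S/sqrt(B) = {baseline:.1f}, optimized = {best:.1f}, cuts = {cuts}")
```

Because a change is kept only when it improves the figure of merit, each run
can never end below its starting value; the reinitializations are what give the
search a chance to escape a poor local optimum.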

Our test case is described in general in [1] and in particular in [2]. A short
summary follows here. The signal we are looking for is
$e^+e^- \to \nu N \to \nu \ell q \bar{q}$, and the main variables are the lepton
energy $E_\ell$, the missing energy $E_\nu$, the invariant mass of $\ell$ and
$\nu$, the invariant mass of the $N$ ($= q, \bar{q}, \ell$) and the lepton type
($\ell = e$, $\mu$ or $\tau$). Both the signal events and the background events
were subject to the full OPAL detector simulation [3] as well as some basic
cuts to ensure good quality [4]. The minimum cuts mentioned above were set to
$E_\ell, E_\nu > 5$ GeV. The Monte Carlo generator EXOTIC [5] was used to
generate the $e^+e^- \to \nu N$ signal. The following masses were simulated:
$M_N = 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200$ GeV, and for each
mass the energies $E = 183, 189, 192, 196, 200, 202, 204, 205, 206, 207, 208$
GeV for all $M_N < E$. The total number of signal events surviving the initial
cuts was about 350 for each pair $(E, M_N)$. A variety of MC generators was
used to study the multihadronic background from the SM, see [2] and references
therein. The relevant backgrounds are $q\bar{q}(\gamma)$ (KK2f+PYTHIA 6.125),
$\ell\ell qq$, $eeqq$, $qqqq$, $ee\tau\tau$ (grc4f 2.1) and $\gamma\gamma qq$
(HERWIG).
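
To make the size of the signal sample concrete, the grid of simulated points
can be enumerated directly. The short sketch below assumes that a point is
generated only when the heavy-lepton mass lies strictly below the
center-of-mass energy, and it uses the approximate figure of 350 surviving
events per point; the variable names are illustrative only.

```python
# Masses and center-of-mass energies quoted in the text, in GeV.
masses = list(range(100, 201, 10))
energies = [183, 189, 192, 196, 200, 202, 204, 205, 206, 207, 208]

# Assumption: a sample exists only where the mass is strictly below the energy.
pairs = [(e, m) for e in energies for m in masses if m < e]

EVENTS_PER_PAIR = 350  # approximate number surviving the initial cuts
total_events = len(pairs) * EVENTS_PER_PAIR

print(f"{len(pairs)} (E, M_N) points, roughly {total_events} signal events in total")
```

Under these assumptions the grid has 114 points, so the full signal sample is
of the order of 40 000 events.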

The traditional cut-based analysis left us with roughly 5-15 signal events
and 5-10 background events, i.e., $S/\sqrt{B} \sim 5$. On the other hand, the
MC-based method often managed to completely remove the background while still
preserving about 50 signal events. There are several ways to improve the value
of $S/\sqrt{B}$, but they all come at the cost of longer execution time. The
different improvements were:

• Use a high $T_i$ value.

• Use a smaller $\Delta T$ for each iteration.

• Increase the number of iterations, $N_{it}$.

• Change more than one variable before recomputing $S/\sqrt{B}$.

For most of these improvements, the general behaviour was that

$$\frac{S}{\sqrt{B}} \approx 5.1 \times t^{0.37} \qquad (1)$$

where $t$ is the time in seconds. The only exception was increasing the number
of variables changed per step, which was not profitable. The $S/\sqrt{B}$ is
illustrated in Fig. 1. The values have been averaged over ten different
optimization runs. For the dot-dashed curve, the step $\Delta T$ is varied
downwards from $2^4$, being divided by two each time, with $T_i = 20\%$ and
$N_{it} = 2$. We notice that the curve levels out asymptotically. The solid
and the dashed lines both have $\Delta T = 10$, and the number of iterations
goes from $N_{it} = 2$ to 512, multiplied by two each time. Furthermore,
$T_i = 20\%$ for the dashed line and $T_i = 100\%$ for the solid one.

Fig. 1. Comparison of changes in some parameters. The dot-dashed curve
represents changing $\Delta T$, the dashed and solid curves represent changing
$N_{it}$ with two different values $T_i = 20\%$ and $100\%$, and the dotted line
is the approximate result of the traditional cut-based analysis.
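
The practical meaning of the power law relating $S/\sqrt{B}$ to the run time
can be explored with a small calculation. The sketch below assumes that the
relation reads $S/\sqrt{B} \approx 5.1\,t^{0.37}$ with $t$ in seconds, and that
it holds outside the range actually measured, which the text does not claim;
the function names are hypothetical.

```python
# Assumed coefficients of the empirical power law S/sqrt(B) ~ A * t**P.
A = 5.1
P = 0.37


def significance_after(t_seconds):
    """Expected S/sqrt(B) after t_seconds of optimization, under the assumed law."""
    return A * t_seconds ** P


def time_to_reach(target):
    """Run time in seconds needed to reach a target S/sqrt(B), under the assumed law."""
    return (target / A) ** (1.0 / P)


# Factor by which the run time must grow to double S/sqrt(B).
doubling_factor = 2.0 ** (1.0 / P)
print(f"doubling S/sqrt(B) costs about {doubling_factor:.1f} times more run time")
```

With an exponent this small the returns diminish quickly: under the assumed
law, doubling $S/\sqrt{B}$ requires roughly 6.5 times more run time.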